AITopics | global attention

Tailoring Self-Attention for Graph via Rooted Subtrees

Neural Information Processing Systems, Apr-30-2026, 04:07:08 GMT

Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information. In this paper, we propose a novel multihop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. STA seamlessly bridges the fully-attentional structure and the rooted subtree, with theoretical proof that STA approximates the global attention under extreme settings.
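The contrast the abstract draws between local and global attention can be sketched in a few lines of standard-library Python. This is not Subtree Attention itself, only an illustration of the two baselines it is positioned between. The path graph, the feature values, and the use of raw features as queries, keys and values (no learned projections) are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(features, allowed):
    """Single-head dot-product attention where node i attends only to allowed[i].

    Raw features serve as queries, keys and values (an illustrative
    simplification; a real layer would use learned projections).
    """
    out = []
    for i, q in enumerate(features):
        targets = sorted(allowed[i])
        weights = softmax([dot(q, features[j]) for j in targets])
        out.append([
            sum(w * features[j][d] for w, j in zip(weights, targets))
            for d in range(len(q))
        ])
    return out

# Hypothetical path graph 0 - 1 - 2 - 3 with 2-d node features.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
n = len(features)

# Local attention: each node sees itself and its one-hop neighbours.
local = {i: {i} for i in range(n)}
for u, v in edges:
    local[u].add(v)
    local[v].add(u)

# Global attention: every node sees every node, ignoring the edges.
global_ = {i: set(range(n)) for i in range(n)}

local_out = attention(features, local)
global_out = attention(features, global_)
```

With the local mask, node 0 is unaffected by any change to node 3, which is the long-range limitation the abstract describes. With the global mask, node 0 mixes in node 3 directly, but nothing in the computation distinguishes a one-hop neighbour from a three-hop one.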

machine learning, natural language, stagnn, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States > California (0.28)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

cc57fac10eacadb3b72a907ac48f9a98-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 19:18:56 GMT

artificial intelligence, graph, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.67)

Industry:

Leisure & Entertainment (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

f1f9962f76581ce8bf38d04c6d6c96b1-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 16:23:00 GMT

amm, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

Tailoring Self-Attention for Graph via Rooted Subtrees

Neural Information Processing Systems, Feb-17-2026, 18:01:28 GMT

However, local attention limits the receptive field to one-hop neighbors.

machine learning, natural language, stagnn, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Austria (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(8 more...)

Genre: Research Report (0.46)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations

Wu, Qitian

Neural Information Processing Systems, Feb-17-2026, 03:35:47 GMT

Transformers may have sufficient supervision for generalization.

artificial intelligence, graph, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Europe (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.67)

Industry:

Leisure & Entertainment (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

cf78a15772ec1a6aee9bbee2d2b382c3-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 00:47:28 GMT

Our first step is to prove the parameterization (Eq. 3) provides local attention after the ... Note that the weight and bias terms in the above formulation (Eq. ... Assume the position-based function at each head is learned to perform 'hard attention' on one of its surrounding positions, i.e., an extreme semi-dynamic attention. To demonstrate this phenomenon, we plot and compare the impacts of Φc and Φp on Φa in the middle and right of Fig. S4 and visualize learned position-based attention Φp of iRPE in Fig. S5. As seen from Tab. S17, there exist noticeable performance gaps between the models (b, f, g, h) (without Φp) and (a, d, e, i) (with Φp). Without adaptive attention (model (c)), Φp imposes stronger locality on every layer.

artificial intelligence, arxivpreprintarxiv, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

An A-Z list of 2025's biggest stories

Al Jazeera, Dec-31-2025, 06:27:45 GMT

Scroll back through the last year, and the same words come up again and again. The top-trending terms of 2025, from artificial intelligence to Zohran Mamdani, shaped headlines across politics, conflict, technology and climate. As the year comes to a close, AJ Labs has compiled an A to Z list of names, places and issues that generated sustained interest throughout 2025, according to a loose analysis of our own most-viewed story tags and those that appeared in Google's most searched. Taken together, these terms are a patchwork of issues that are also likely to spill into 2026, from ongoing conflicts to a changing technosocial landscape not seen since the dawn of the internet. This is 2025 from A to Z, by the words that made the year.

conflict, israel, nuclear facility, (15 more...)

Al Jazeera

Country:

Asia > Middle East > Israel (0.50)
Asia > Middle East > Iran (0.16)
Europe > France (0.14)
(32 more...)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
Government > Immigration & Customs (0.95)
(5 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Representing Long-Range Context for Graph Neural Networks with Global Attention

Neural Information Processing Systems, Dec-24-2025, 06:46:43 GMT

Graph neural networks are powerful architectures for structured datasets. However, current methods struggle to represent long-range dependencies. Scaling the depth or width of GNNs is insufficient to broaden receptive fields as larger GNNs encounter optimization instabilities such as vanishing gradients and representation oversmoothing, while pooling-based approaches have yet to become as universally useful as in computer vision. In this work, we propose the use of Transformer-based self-attention to learn long-range pairwise relationships, with a novel "readout" mechanism to obtain a global graph embedding. Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module. This simple architecture leads to state-of-the-art results on several graph classification tasks, outperforming methods that explicitly encode graph structure. Our results suggest that purely-learning-based approaches without graph structure may be suitable for learning high-level, long-range relationships on graphs.
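The "GNN module, then permutation-invariant attention, then readout" pipeline described above can be sketched as follows. This is a heavily reduced illustration, not the GraphTrans implementation: the GNN is a single mean-aggregation step, the Transformer module is collapsed into one attention step from an extra readout token, and the token values, graph and features are hypothetical. Using an extra token as the readout is itself an assumption here, since the abstract only mentions a "readout" mechanism.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gnn_layer(features, neighbours):
    """One round of mean aggregation over each node and its neighbours."""
    out = []
    for i, x in enumerate(features):
        group = [x] + [features[j] for j in neighbours[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def readout(node_embeddings, readout_token):
    """Attend from one extra token to all node embeddings.

    No positional encoding is used, so the result does not depend on
    the order in which nodes are listed.
    """
    scores = [sum(a * b for a, b in zip(readout_token, h)) for h in node_embeddings]
    weights = softmax(scores)
    dim = len(readout_token)
    return [sum(w * h[d] for w, h in zip(weights, node_embeddings)) for d in range(dim)]

# Hypothetical graph: triangle 0-1-2 with a tail node 3 attached to node 2.
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
readout_token = [0.5, 0.5]  # stands in for a learned embedding

# Local structure is used only in the GNN step; the readout sees all nodes at once.
graph_embedding = readout(gnn_layer(features, neighbours), readout_token)
```

Relabelling the nodes and recomputing gives the same `graph_embedding` up to floating-point rounding, which is the permutation invariance the abstract refers to.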

graph neural network, name change, representing long-range context, (7 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)

RATTENTION: Towards the Minimal Sliding Window Size in Local-Global Attention Models

Wang, Bailin, Lan, Chang, Wang, Chong, Pang, Ruoming

arXiv.org Artificial Intelligence, Nov-18-2025

Local-global attention models have recently emerged as compelling alternatives to standard Transformers, promising improvements in both training and inference efficiency. However, the crucial choice of window size presents a Pareto tradeoff: larger windows maintain performance akin to full attention but offer minimal efficiency gains in short-context scenarios, while smaller windows can lead to performance degradation. Current models, such as Gemma2 and Mistral, adopt conservative window sizes (e.g., 4096 out of an 8192 pretraining length) to preserve performance. This work investigates strategies to shift this Pareto frontier, enabling local-global models to achieve efficiency gains even in short-context regimes. Our core motivation is to address the intrinsic limitation of local attention -- its complete disregard for tokens outside the defined window. We explore RATTENTION, a variant of local attention integrated with a specialized linear attention mechanism designed to capture information from these out-of-window tokens. Pretraining experiments at the 3B and 12B scales demonstrate that RATTENTION achieves a superior Pareto tradeoff between performance and efficiency. As a sweet spot, RATTENTION with a window size of just 512 consistently matches the performance of full-attention models across diverse settings. Furthermore, the recurrent nature inherent in the linear attention component of RATTENTION contributes to enhanced long-context performance, as validated on the RULER benchmark. Crucially, these improvements do not compromise training efficiency; thanks to a specialized kernel implementation and the reduced window size, RATTENTION maintains training speeds comparable to existing state-of-the-art approaches. We open-sourced our Pallas kernels along with model codes to facilitate further research effort.
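The idea of pairing sliding-window attention with a linear-attention path for out-of-window tokens can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the RATTENTION kernels: the ELU + 1 feature map, the recurrent state layout, and the choice to simply add the two outputs are all assumptions for the example, and `local_plus_linear` is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Positive feature map ELU(x) + 1 (an assumed choice for linear attention)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def local_plus_linear(queries, keys, values, window):
    """Causal sliding-window softmax attention plus a recurrent linear-attention
    summary of every token that has already left the window."""
    dim_k, dim_v = len(keys[0]), len(values[0])
    state = [[0.0] * dim_v for _ in range(dim_k)]  # running sum of phi(k) v^T
    norm = [0.0] * dim_k                           # running sum of phi(k)
    outputs = []
    for t, q in enumerate(queries):
        leaving = t - window  # index of the token that just left the window
        if leaving >= 0:
            fk = phi(keys[leaving])
            for a in range(dim_k):
                norm[a] += fk[a]
                for b in range(dim_v):
                    state[a][b] += fk[a] * values[leaving][b]
        # Softmax attention over the last `window` tokens, including t.
        span = range(max(0, t - window + 1), t + 1)
        weights = softmax([dot(q, keys[j]) / math.sqrt(dim_k) for j in span])
        near = [sum(w * values[j][b] for w, j in zip(weights, span)) for b in range(dim_v)]
        # Linear attention over the out-of-window tokens via the recurrent state.
        fq = phi(q)
        denom = dot(fq, norm)
        if denom > 0:
            far = [sum(fq[a] * state[a][b] for a in range(dim_k)) / denom for b in range(dim_v)]
        else:
            far = [0.0] * dim_v
        outputs.append([x + y for x, y in zip(near, far)])  # assumed combination: sum
    return outputs

# Hypothetical 5-token sequence with 2-d keys/queries and 1-d values.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
v = [[1.0], [2.0], [3.0], [4.0], [5.0]]
out = local_plus_linear(q, k, v, window=2)
```

With `window=2`, the last position only attends directly to tokens 3 and 4, yet changing the value of token 0 still changes its output, because token 0 has been folded into the recurrent state. The state has a fixed size regardless of sequence length, which is what makes a small window affordable in this kind of design.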

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.15545

Genre: Research Report > New Finding (0.93)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Neural Information Processing Systems, Oct-10-2025, 21:17:27 GMT

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications.

amm, training strategy, transformer, (14 more...)

Neural Information Processing Systems

Country: